NSF PAR Search | NSF Public Access Repository

Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning

Banerjee, Arko; Rahmani, Kia; Biswas, Joydeep; Dillig, Isil (December 2024, NeurIPS 2024)

Among approaches for provably safe reinforcement learning, Model Predictive Shielding (MPS) has proven effective at complex tasks in continuous, high-dimensional state spaces, by leveraging a backup policy to ensure safety when the learned policy attempts to take risky actions. However, while MPS can ensure safety both during and after training, it often hinders task progress due to the conservative and task-oblivious nature of backup policies. This paper introduces Dynamic Model Predictive Shielding (DMPS), which optimizes reinforcement learning objectives while maintaining provable safety. DMPS employs a local planner to dynamically select safe recovery actions that maximize both short-term progress and long-term rewards. Crucially, the planner and the neural policy play a synergistic role in DMPS. When planning recovery actions for ensuring safety, the planner utilizes the neural policy to estimate long-term rewards, allowing it to observe beyond its short-term planning horizon. Conversely, the neural policy under training learns from the recovery plans proposed by the planner, converging to policies that are both high-performing and safe in practice. This approach guarantees safety during and after training, with bounded recovery regret that decreases exponentially with planning horizon depth. Experimental results demonstrate that DMPS converges to policies that rarely require shield interventions after training and achieve higher rewards compared to several state-of-the-art baselines.
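The shielding idea in this abstract can be illustrated with a toy sketch. The code below is not the authors' implementation: the 1-D point-mass dynamics, the constants, and the names `mps_action`, `dmps_action`, and `value_estimate` are all assumptions made for illustration. `mps_action` accepts the learned action only if a braking backup policy can still recover afterwards; `dmps_action` replaces the fixed fallback with a one-step search over safe candidate actions, a drastic simplification of the local planner the abstract describes, scored by short-term progress plus a caller-supplied long-term estimate.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]  # (position, velocity)

# Illustrative constants (assumptions, not values from the paper).
DT = 0.1          # integration step
MAX_ACCEL = 1.0   # actuator limit
WALL = 10.0       # any position >= WALL is unsafe


def step(state: State, accel: float) -> State:
    """Deterministic toy dynamics: a point mass moving on a line."""
    pos, vel = state
    accel = max(-MAX_ACCEL, min(MAX_ACCEL, accel))
    vel = vel + accel * DT
    return (pos + vel * DT, vel)


def is_safe(state: State) -> bool:
    return state[0] < WALL


def backup_policy(state: State) -> float:
    """Task-oblivious backup: brake toward zero velocity."""
    return max(-MAX_ACCEL, min(MAX_ACCEL, -state[1] / DT))


def is_recoverable(state: State, horizon: int = 200) -> bool:
    """True if the backup policy keeps `state` safe until it comes to rest."""
    for _ in range(horizon):
        if not is_safe(state):
            return False
        if abs(state[1]) < 1e-9:
            return True
        state = step(state, backup_policy(state))
    return False


def mps_action(state: State,
               learned_policy: Callable[[State], float]) -> Tuple[float, bool]:
    """Plain shielding: use the learned action if recoverable, else brake."""
    proposed = learned_policy(state)
    if is_recoverable(step(state, proposed)):
        return proposed, False
    return backup_policy(state), True


def dmps_action(state: State,
                learned_policy: Callable[[State], float],
                value_estimate: Callable[[State], float],
                candidates: Sequence[float] = (-1.0, -0.5, 0.0, 0.5, 1.0),
                ) -> Tuple[float, bool]:
    """Dynamic shielding sketch: pick the best-scoring safe recovery action."""
    proposed = learned_policy(state)
    if is_recoverable(step(state, proposed)):
        return proposed, False
    safe = [a for a in candidates if is_recoverable(step(state, a))]
    if not safe:
        # No candidate is safe, so fall back to the backup policy.
        return backup_policy(state), True

    def score(accel: float) -> float:
        nxt = step(state, accel)
        # Short-term progress plus an assumed long-term estimate.
        return (nxt[0] - state[0]) + value_estimate(nxt)

    return max(safe, key=score), True
```

Both functions return the chosen action together with a flag saying whether the shield intervened. Starting from a recoverable state, every action they return leads to another recoverable state, which is the invariant that keeps this toy system short of the wall.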
Full Text Available
Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning

Banerjee, Arko; Rahmani, Kia; Biswas, Joydeep; Dillig, Isil (December 2024, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
Programmatic Imitation Learning From Unlabeled and Noisy Demonstrations

https://doi.org/10.1109/LRA.2024.3385691

Xin, Jimmy; Zheng, Linus; Rahmani, Kia; Wei, Jiayi; Holtz, Jarrett; Dillig, Isil; Biswas, Joydeep (June 2024, IEEE Robotics and Automation Letters)

Full Text Available
Programming-by-Demonstration for Long-Horizon Robot Tasks

https://doi.org/10.1145/3632860

Patton, Noah; Rahmani, Kia; Missula, Meghana; Biswas, Joydeep; Dillig, Işıl (January 2024, Proceedings of the ACM on Programming Languages)

The goal of programmatic Learning from Demonstration (LfD) is to learn a policy in a programming language that can be used to control a robot’s behavior from a set of user demonstrations. This paper presents a new programmatic LfD algorithm that targets long-horizon robot tasks which require synthesizing programs with complex control flow structures, including nested loops with multiple conditionals. Our proposed method first learns a program sketch that captures the target program’s control flow and then completes this sketch using an LLM-guided search procedure that incorporates a novel technique for proving unrealizability of programming-by-demonstration problems. We have implemented our approach in a new tool called prolex and present the results of a comprehensive experimental evaluation on 120 benchmarks involving complex tasks and environments. We show that, given a 120 second time limit, prolex can find a program consistent with the demonstrations in 80% of the cases. Furthermore, for 81% of the tasks for which a solution is returned, prolex is able to find the ground truth program with just one demonstration. In comparison, CVC5, a syntax-guided synthesis tool, is only able to solve 25% of the cases even when given the ground truth program sketch, and an LLM-based approach, GPT-Synth, is unable to solve any of the tasks due to the environment complexity.
Repairing serializability bugs in distributed database programs via automated schema refactoring

https://doi.org/10.1145/3453483.3454028

Rahmani, Kia; Nagar, Kartik; Delaware, Benjamin; Jagannathan, Suresh (June 2021, ACM Conference on Programming Language Design and Implementation)
Full Text Available
CLOTHO: directed test generation for weakly consistent database systems

https://doi.org/10.1145/3360543

Rahmani, Kia; Nagar, Kartik; Delaware, Benjamin; Jagannathan, Suresh (October 2019, Proceedings of the ACM on Programming Languages)

Relational database applications are notoriously difficult to test and debug. Concurrent execution of database transactions may violate complex structural invariants that constrain how changes to the contents of one (shared) table affect the contents of another. Simplifying the underlying concurrency model is one way to ameliorate the difficulty of understanding how concurrent accesses and updates can affect database state with respect to these sophisticated properties. Enforcing serializable execution of all transactions achieves this simplification, but it comes at a significant price in performance, especially at scale, where database state is often replicated to improve latency and availability. To address these challenges, this paper presents a novel testing framework for detecting serializability violations in (SQL) database-backed Java applications executing on weakly-consistent storage systems. We manifest our approach in a tool, CLOTHO, that combines a static analyzer and model checker to generate abstract executions, discover serializability violations in these executions, and translate them back into concrete test inputs suitable for deployment in a test environment. To the best of our knowledge, CLOTHO is the first automated test generation facility for identifying serializability anomalies of Java applications intended to operate in geo-replicated distributed environments. An experimental evaluation on a set of industry-standard benchmarks demonstrates the utility of our approach.
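To make "serializability violation" concrete, here is a small sketch of the kind of anomaly such a tool looks for. It is not how CLOTHO works (the abstract describes static analysis and model checking over SQL-backed Java applications). It swaps in a brute-force final-state check on a toy key-value store, and every name in it (`deposit`, `serial_outcomes`, `stale_read_outcome`, `is_final_state_serializable`) is hypothetical. An outcome counts as serializable here if some serial order of the transactions produces the same final state.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence

Store = Dict[str, int]
# A transaction reads from a snapshot and returns the writes it wants applied.
Txn = Callable[[Store], Store]


def deposit(amount: int) -> Txn:
    """Read-modify-write on a single 'balance' key (toy example)."""
    def run(snapshot: Store) -> Store:
        return {"balance": snapshot["balance"] + amount}
    return run


def serial_outcomes(initial: Store, txns: Sequence[Txn]) -> List[Store]:
    """Final states reachable by running the transactions one after another."""
    outcomes: List[Store] = []
    for order in permutations(txns):
        store = dict(initial)
        for txn in order:
            store.update(txn(store))
        if store not in outcomes:
            outcomes.append(store)
    return outcomes


def stale_read_outcome(initial: Store, txns: Sequence[Txn]) -> Store:
    """Weakly consistent execution: every transaction reads the same stale
    snapshot (as two replicas might), and the writes are applied in order."""
    store = dict(initial)
    for txn in txns:
        store.update(txn(initial))
    return store


def is_final_state_serializable(observed: Store, initial: Store,
                                txns: Sequence[Txn]) -> bool:
    return observed in serial_outcomes(initial, txns)


initial = {"balance": 100}
txns = [deposit(50), deposit(30)]
observed = stale_read_outcome(initial, txns)
violation = not is_final_state_serializable(observed, initial, txns)
```

With these inputs, every serial order ends at a balance of 180, while the stale-read execution ends at 130 because the first deposit is lost, so `violation` is true. The brute-force enumeration is factorial in the number of transactions and is only meant to show the definition on a tiny case.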
